Conditions on abruptness in a gradient-ascent Maximum Entropy learner
Author
Abstract
When does a gradual learning rule translate into gradual learning performance? This paper studies a gradient-ascent Maximum Entropy phonotactic learner, as applied to two-alternative forced-choice performance expressed as log-odds. The main result is that slow initial performance cannot accelerate later if the initial weights are near zero, but can if they are not. Stated another way, abruptness in this learner is an effect of transfer, either from Universal Grammar in the form of an initial weighting, or from previous learning in the form of an acquired weighting.
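For concreteness, the following is a minimal sketch, in Python, of the kind of learner the abstract describes: a gradient-ascent Maximum Entropy phonotactic grammar whose two-alternative forced-choice performance is tracked as log-odds. The candidate forms, constraint violation profiles, learning rate, and the nonnegativity clipping are illustrative assumptions, not details taken from the paper.

import numpy as np

# Rows: candidate forms; columns: violation counts for two made-up constraints.
candidates = np.array([
    [0, 0],   # "pa"  -- the attested (grammatical) form
    [1, 0],   # "ba"
    [0, 2],   # "pna" -- the unattested (ungrammatical) comparison form
    [1, 2],   # "bna"
], dtype=float)

observed = candidates[0]          # training data: the grammatical form only
GRAM, UNGRAM = 0, 2               # indices used for the 2AFC comparison

def probs(w):
    """MaxEnt probabilities: P(x) proportional to exp(-sum_k w_k * c_k(x))."""
    scores = np.exp(-candidates @ w)
    return scores / scores.sum()

def log_odds(w):
    """Log-odds of choosing the grammatical form over the ungrammatical one."""
    return float(w @ (candidates[UNGRAM] - candidates[GRAM]))

def train(w_init, eta=0.1, steps=50):
    """Plain batch gradient ascent on the log-likelihood of the observed form."""
    w = np.array(w_init, dtype=float)
    trajectory = []
    for _ in range(steps):
        expected = probs(w) @ candidates      # E_P[c_k] under the current grammar
        w = w + eta * (expected - observed)   # gradient of the log-likelihood
        w = np.maximum(w, 0.0)                # keep constraint weights nonnegative
        trajectory.append(log_odds(w))
    return w, trajectory

# Compare a zero initial weighting with a nonzero ("transferred") one.
_, flat_start = train([0.0, 0.0])
_, biased_start = train([0.0, 2.0])
print("log-odds after training, zero start:   %.3f" % flat_start[-1])
print("log-odds after training, biased start: %.3f" % biased_start[-1])

The two calls to train compare a zero initial weighting with a nonzero one; the paper's claim concerns the shape of the resulting log-odds trajectories (whether slow initial performance can later accelerate), not just their endpoints.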
Similar works
Updating ACO Pheromones Using Stochastic Gradient Ascent and Cross-Entropy Methods
In this paper we introduce two systematic approaches, based on the stochastic gradient ascent algorithm and the cross-entropy method, for deriving the pheromone update rules in the Ant colony optimization metaheuristic. We discuss the relationships between the two methods as well as connections to the update rules previously proposed in the literature.
Probability Density Estimation Using Entropy Maximization
We propose a method for estimating probability density functions and conditional density functions by training on data produced by such distributions. The algorithm employs new stochastic variables that amount to coding of the input, using a principle of entropy maximization. It is shown to be closely related to the maximum likelihood approach. The encoding step of the algorithm provides an est...
Notes on CG and LM-BFGS Optimization of Logistic Regression
It has been recognized that the typical iterative scaling methods [?, ?] used to train logistic regression classification models (maximum entropy models) are quite slow. Goodman has suggested the use of a component-wise optimization of GIS [?], which he has measured to be faster on many tasks. However, in general, the iterative scaling methods pale in comparison to conjugate gradient ascent (fo...
Maximum within-cluster association
This paper addresses a new method and aspect of information-theoretic clustering where we exploit the minimum entropy principle and the quadratic distance measure between probability densities. We present a new minimum entropy objective function which leads to the maximization of within-cluster association. A simple implementation using the gradient ascent method is given. In addition, we show...
Naive Parameter Learning for Optimality Theory - The Hidden Structure Problem
There exist a number of provably correct learning algorithms for Optimality Theory and closely related theories. These include Constraint Demotion (CD; Tesar 1995, et seq.), a family of algorithms for classic OT. For Harmonic Grammar (Legendre, Miyata and Smolensky 1990; Smolensky and Legendre 2006) and related theories (e.g. maximum entropy), there is Stochastic Gradient Ascent (SGA; Soderstro...